Introduction

AdhereR is an R package that aims to facilitate the computation of adherence from electronic healthcare data (EHD), as well as the transparent reporting of the chosen calculations. It contains a set of R S3 classes and functions that compute, summarize and plot various estimates of adherence.

This tutorial aims to introduce researchers to the principles of EHD preparation required to estimate adherence with the AdhereR package. It uses example data to illustrate the various decisions required and their impact on estimates, starting with the visualization of medication events, computation of persistence (treatment episode length), and computation of adherence.

Please contact us with suggestions, bug reports, comments (or even just to share your experiences using the package) either by e-mail (to Dan, ddediu@gmail.com; Alexandra, alexadima@gmail.com; or Samuel, samuel.allemann@gmx.ch) or using GitHub’s reporting mechanism at our repository https://github.com/ddediu/AdhereR, which contains the full source code of the package.

Data preparation and example dataset

AdhereR requires a dataset of medication events over a follow-up window (FUW) of sufficient length in relation to the recommended treatment duration. To our knowledge, no research has been performed to date on the relationship between FUW length and recommended treatment duration. AdhereR offers the opportunity to answer such methodological questions, but we would hypothesize that the appropriate FUW length also depends on the duration of medication events (shorter event durations would allow shorter FUWs to be informative).

The minimum necessary dataset includes 3 variables for each medication event: patient unique identifier, event date, and duration. Daily dosage and medication type are optional. AdhereR is thus designed to use datasets that have already been extracted from EHD and prepared for calculation. The preliminary data preparation depends to a large extent on the specific database used, the type of medication, and the research design. Several general guidelines can be consulted (Arnet et al., 2016; Peterson et al., 2007), as well as database-specific documentation. In essence, these steps should entail:

Oftentimes, datasets can be large with hundreds of thousands or even millions of rows. Provided that working memory is big enough to hold the entire dataset, R can handle it. Manipulating these large datasets with the standard R libraries can be a hassle. We will use the package data.table, which usually requires less memory and is faster than the standard data.frame format.

Suppose you have a dataset with the following information:

Table 1 shows the medication events of one example patient: 32 medication events related to 8 medications.

# Load the AdhereR and data.table libraries (and install if not already installed):
if (!require("AdhereR")) install.packages("AdhereR")
## Loading required package: AdhereR
if (!require("data.table")) install.packages("data.table")
## Loading required package: data.table
# Load example dataset
disp_events <- fread("./AdhereR Tutorial/data/example_disp_events.csv")
# Display the first patient as pretty markdown table:
knitr::kable(disp_events[PATIENT.ID == 42], caption = "<a name=\"Table-1\"></a>**Table 1.** Medication events for one example patient");
Table 1. Medication events for one example patient
PATIENT.ID  DATE.DISP   ATC.CODE  DOSE       QUANTITY
42          14.01.2015  A02BC05   40 MG      28
42          07.03.2015  A02BC05   20 MG      14
42          03.10.2014  A02BC05   40 MG      28
42          03.12.2014  A02BC05   40 MG      28
42          04.08.2015  A02BC05   40 MG      28
42          09.02.2016  A02BC05   40 MG      28
42          21.04.2016  A02BC05   40 MG      28
42          09.02.2016  A11CC05   1e+05 UI   1
42          09.10.2014  J01FG01   500 MG     96
42          21.10.2014  J01EE01   800 MG     50
42          09.10.2014  R03AC12   50 MICROG  60
42          14.01.2015  J01EE01   800 MG     30
42          07.03.2015  J01EE01   800 MG     10
42          02.05.2015  J01EE01   800 MG     20
42          15.09.2015  J01EE01   800 MG     20
42          11.12.2015  J01CR02   500 MG     24
42          09.10.2014  A09AA02   25000 UI   60
42          24.02.2015  A09AA02   25000 UI   60
42          26.05.2015  A09AA02   25000 UI   60
42          04.09.2015  A09AA02   25000 UI   60
42          03.06.2016  A09AA02   25000 UI   180
42          12.02.2015  A09AA02   12000 UI   60
42          02.05.2015  A09AA02   12000 UI   120
42          10.12.2014  A09AA02   12000 UI   120
42          19.01.2015  J01CR02   1000 MG    24
42          04.09.2015  A02BC02   20 MG      28
42          03.06.2016  A02BC02   20 MG      28
42          09.07.2016  A02BC02   20 MG      28
42          13.04.2015  A02BC05   20 MG      28
42          20.04.2015  A02BC05   20 MG      28
42          05.08.2014  A02BC05   40 MG      28
42          26.05.2015  A02BC05   40 MG      28

Data cleaning

First, we have to make sure that the data is in the right format. We can use the function str() to check the format of our variables and summary() for a first plausibility check.

# Check format of variables:
str(disp_events)

Classes 'data.table' and 'data.frame': 366 obs. of 5 variables:
 $ PATIENT.ID: int 42 42 42 42 42 42 42 42 42 42 ...
 $ DATE.DISP : chr "14.01.2015" "07.03.2015" "03.10.2014" "03.12.2014" ...
 $ ATC.CODE  : chr "A02BC05" "A02BC05" "A02BC05" "A02BC05" ...
 $ DOSE      : chr "40 MG" "20 MG" "40 MG" "40 MG" ...
 $ QUANTITY  : int 28 14 28 28 28 28 28 1 96 50 ...
 - attr(*, ".internal.selfref")=<externalptr>
 - attr(*, "index")= atomic
  ..- attr(*, "__PATIENT.ID")= int

We can see that the DATE.DISP column is in CHARACTER instead of DATE format, and the DOSE is in CHARACTER format, too, because the unit is appended to it.

We convert DATE.DISP to the Date format and split the DOSE variable into its numeric part and its unit, stored in the separate variables DOSE.num and UNIT. In addition, we convert the ATC code (ATC.CODE) to a factor variable. Let’s also convert the DOSE.num variable to NUMERIC and the UNIT variable to FACTOR, so that we can check whether the units are recorded consistently across rows.

# Convert DATE to the DATE format and split DOSE into two variables:
disp_events[,`:=` (DATE.DISP = as.Date(DATE.DISP, format = "%d.%m.%Y"), #convert Date to date format
                   ATC.CODE = as.factor(ATC.CODE) #convert ATC-Code to factor variable
                   )]

disp_events[,c("DOSE.num", "UNIT"):= tstrsplit(DOSE, " ")] #split Dose on whitespace
disp_events[,DOSE := NULL]

# Convert DOSE.num to numeric and UNIT to factor:
disp_events[,`:=` (DOSE.num = as.numeric(DOSE.num), #convert DOSE.num to numeric
                   UNIT = as.factor(UNIT) #convert UNIT to factor variable
                   )]

# Check format of variables:
str(disp_events)

Classes 'data.table' and 'data.frame': 366 obs. of 6 variables:
 $ PATIENT.ID: int 42 42 42 42 42 42 42 42 42 42 ...
 $ DATE.DISP : Date, format: "2015-01-14" "2015-03-07" ...
 $ ATC.CODE  : Factor w/ 31 levels "A02BC02","A02BC05",..: 2 2 2 2 2 2 2 5 15 14 ...
 $ QUANTITY  : int 28 14 28 28 28 28 28 1 96 50 ...
 $ DOSE.num  : num 4e+01 2e+01 4e+01 4e+01 4e+01 4e+01 4e+01 1e+05 5e+02 8e+02 ...
 $ UNIT      : Factor w/ 4 levels "MCG","MG","MICROG",..: 2 2 2 2 2 2 2 4 2 2 ...
 - attr(*, ".internal.selfref")=<externalptr>
 - attr(*, "index")= atomic
  ..- attr(*, "__PATIENT.ID")= int

# Check summary of variables:
summary(disp_events)

   PATIENT.ID      DATE.DISP             ATC.CODE      QUANTITY
 Min.   :42.00   Min.   :2014-07-01   A09AA02: 50   Min.   :  1.00
 1st Qu.:45.00   1st Qu.:2015-01-07   A02BC05: 47   1st Qu.: 28.00
 Median :45.00   Median :2015-05-12   A11HA03: 35   Median : 60.00
 Mean   :44.91   Mean   :2015-06-19   A05AA02: 23   Mean   : 99.83
 3rd Qu.:46.00   3rd Qu.:2016-01-08   J02AC02: 23   3rd Qu.:120.00
 Max.   :47.00   Max.   :2016-07-09   R03DX05: 19   Max.   :672.00
                                      (Other):169
    DOSE.num             UNIT
 Min.   :      2.0   MCG   :  3
 1st Qu.:     56.2   MG    :236
 Median :    200.0   MICROG: 43
 Mean   :  17556.4   UI    : 84
 3rd Qu.:    800.0
 Max.   :1000000.0

Now that the data is in the right format, we can check the summary for implausible or missing data. All the dates fall in the years 2014-2016, which corresponds to our intended follow-up window. There are 31 different medications (as seen from the 31 factor levels of the ATC.CODE variable). There are 4 different units in the UNIT variable: MG, MCG, MICROG, and UI. MCG and MICROG both refer to the same unit, microgrammes, so there should be only one version of this unit. We could change the original data and replace all instances of MCG with MICROG, but one of the data cleaning principles is to never change the original data. Instead, we will modify the data transparently and reproducibly in our script. If there are a lot of modifications, they could go in a separate file that contains only the modifications.

# Assign *MCG* and *MICROG* to the same factor level
levels(disp_events$UNIT) <- list(MICROG=c("MCG", "MICROG"), MG="MG", UI="UI")
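To see what this assignment does, here is a minimal sketch on a toy factor (the vector is made up for illustration). Assigning a named list to levels() maps every old level listed in an element to that element's name, so "MCG" and "MICROG" collapse into a single level.

```r
# Toy factor with the same four units (values are illustrative only)
unit <- factor(c("MG", "MCG", "MICROG", "UI", "MG"))
levels(unit)   # four levels, including both "MCG" and "MICROG"

# Each list element name becomes a new level;
# its values are the old levels that are merged into it
levels(unit) <- list(MICROG = c("MCG", "MICROG"), MG = "MG", UI = "UI")

levels(unit)   # "MICROG" "MG" "UI"
table(unit)    # MICROG: 2, MG: 2, UI: 1
```

The underlying values are not retyped anywhere: only the labels change, which keeps the modification visible and reproducible in the script.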

To calculate continuous multiple interval measures of medication availability/gaps (CMAs), AdhereR requires a DURATION for each dispensing event, but we only have the quantity. We could assume that patients need to administer one unit per day and use the QUANTITY variable, but for the medication of interest, this is clearly not appropriate. Standard doses (e.g. WHO’s ‘Defined Daily Dose’) or other assumptions may be appropriate in some instances, but might introduce bias in others. For this example, we have a second database where the prescribed dosage for each medication per patient is recorded:

  • patient unique identifier (PATIENT.ID),
  • event date (DATE.PRESC; date of prescription, in the “mm/dd/yyyy” format),
  • ATC code (ATC.CODE; alpha-numeric code to identify the medication)
  • dosage (DAILY.DOSE; prescribed dose of the medication per day), and
  • unit (UNIT; unit of the prescribed dose).
# Load example prescription data:

load("./AdhereR Tutorial/data/example_presc_events.RData")

str(presc_events)

Classes 'data.table' and 'data.frame': 47 obs. of 5 variables:
 $ PATIENT.ID: int 43 45 46 45 46 47 43 46 47 45 ...
 $ DATE.PRESC: Date, format: "2014-01-01" "2014-01-01" ...
 $ ATC.CODE  : chr "J01CR02" "J02AA01" "J01DF01" "J01DC02" ...
 $ DAILY.DOSE: num 1300 12 75 500 1000 1000 1000 1000 1000 1000000 ...
 $ UNIT      : Factor w/ 3 levels "MG","MICROG",..: 1 1 1 1 1 1 3 3 3 3 ...
 - attr(*, ".internal.selfref")=<externalptr>

summary(presc_events)

   PATIENT.ID      DATE.PRESC           ATC.CODE
 Min.   :42.00   Min.   :2014-01-01   Length:47
 1st Qu.:45.00   1st Qu.:2014-01-01   Class :character
 Median :45.00   Median :2014-01-01   Mode  :character
 Mean   :45.21   Mean   :2014-01-01
 3rd Qu.:46.00   3rd Qu.:2014-01-01
 Max.   :47.00   Max.   :2014-01-01
   DAILY.DOSE            UNIT
 Min.   :      0.1   MG    :27
 1st Qu.:    100.0   MICROG: 8
 Median :    500.0   UI    :12
 Mean   :  65161.9
 3rd Qu.:   1400.0
 Max.   :1000000.0

Conveniently, this dataset is already clean and all the data is in the right format. Moreover, there is only one prescription event per medication, which occurred before the first dispensing event, so we don’t have to deal with prescription changes during our follow-up period.

We can now merge the two datasets and calculate the duration for each dispensing event. We merge by PATIENT.ID, ATC.CODE and UNIT to make sure that events are matched correctly. This is why it was necessary to clean up the units: otherwise, some events might not merge correctly due to mismatches between the units. By default, the merge function only includes rows whose key variables are present in both datasets. This means that we would only capture medications that were both prescribed and dispensed at least once during the follow-up period. If we want to capture all events, we can specify all = TRUE in the function arguments:

# Merge dispensing and prescription data:
med_events <- merge(disp_events, presc_events, by = c("PATIENT.ID", "ATC.CODE", "UNIT"), all = TRUE, sort = FALSE)

# Calculate the supply duration

med_events[,DURATION := (DOSE.num*QUANTITY)/DAILY.DOSE]

summary(med_events)

   PATIENT.ID      ATC.CODE             UNIT        DATE.DISP
 Min.   :42.00   Length:370         MICROG: 48   Min.   :2014-07-01
 1st Qu.:45.00   Class :character   MG    :237   1st Qu.:2015-01-07
 Median :45.00   Mode  :character   UI    : 85   Median :2015-05-12
 Mean   :44.92                                   Mean   :2015-06-19
 3rd Qu.:46.00                                   3rd Qu.:2016-01-08
 Max.   :47.00                                   Max.   :2016-07-09
                                                 NA's   :4
    QUANTITY         DOSE.num            DATE.PRESC
 Min.   :  1.00   Min.   :      2.0   Min.   :2014-01-01
 1st Qu.: 28.00   1st Qu.:     56.2   1st Qu.:2014-01-01
 Median : 60.00   Median :    200.0   Median :2014-01-01
 Mean   : 99.83   Mean   :  17556.4   Mean   :2014-01-01
 3rd Qu.:120.00   3rd Qu.:    800.0   3rd Qu.:2014-01-01
 Max.   :672.00   Max.   :1000000.0   Max.   :2014-01-01
 NA's   :4        NA's   :4           NA's   :52
   DAILY.DOSE          DURATION
 Min.   :      0.1   Min.   :  3.00
 1st Qu.:    100.0   1st Qu.: 20.00
 Median :    800.0   Median : 30.00
 Mean   :  72616.4   Mean   : 35.51
 3rd Qu.:   1785.0   3rd Qu.: 30.00
 Max.   :1000000.0   Max.   :461.54
 NA's   :52          NA's   :56
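The merge and the duration formula can be checked by hand on a small made-up example (the rows below are illustrative and not taken from the example files). In this sketch, 28 tablets of 40 MG dispensed against a prescribed daily dose of 40 MG give 28 days of supply. Rows that exist in only one of the two datasets are kept because of all = TRUE, but their DURATION is NA, which is presumably where the NA's in the summary above come from.

```r
# Illustrative toy data (not from the example files)
disp <- data.frame(PATIENT.ID = c(1, 1, 1),
                   ATC.CODE   = c("A02BC05", "A02BC05", "J01EE01"),
                   UNIT       = c("MG", "MG", "MG"),
                   DOSE.num   = c(40, 20, 800),
                   QUANTITY   = c(28, 28, 10))
presc <- data.frame(PATIENT.ID = c(1, 1),
                    ATC.CODE   = c("A02BC05", "R03AC12"),
                    UNIT       = c("MG", "MICROG"),
                    DAILY.DOSE = c(40, 100))

# Full outer join: keep rows that appear in only one of the two datasets
toy <- merge(disp, presc, by = c("PATIENT.ID", "ATC.CODE", "UNIT"), all = TRUE)

# Supply duration in days = total dispensed dose / prescribed daily dose
toy$DURATION <- (toy$DOSE.num * toy$QUANTITY) / toy$DAILY.DOSE

# A02BC05: 28 and 14 days; J01EE01 (never prescribed) and
# R03AC12 (never dispensed) end up with DURATION = NA
toy
```

Rows with a missing DURATION cannot contribute to adherence estimates, so it is worth inspecting them before moving on rather than dropping them silently.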

When assessing adherence in clinical practice over an extended period of time, unlike in our example, there will be many events affecting adherence estimation, such as prescription changes or hospitalizations. These data are increasingly available for research, but can be tricky to process correctly. In its newest version, AdhereR offers a function to link dispensing, prescription, and hospitalization data to improve the accuracy of adherence estimation. For each dispensing event, it automatically selects the last prescribed dose to calculate supply duration, checks for prescription changes and hospitalizations during this period, and adjusts the duration accordingly. It requires the following input:

  • x : A data.frame with the dispensing data
  • y : A data.frame with the prescription data
  • z : optional, a data.frame with the hospitalization data
  • ID.var : A character vector of the ID column (identical in all data sources)
  • DATE.PRESC.var : A character vector of the prescription date column (in y)
  • DATE.DISP.var : A character vector of the dispensing date column (in x)
  • DATE.format : A character vector of the date format (identical in all data sources)
  • CATEGORY.var : A character vector of the medication identification column (identical for x and y)
  • TOTAL.DOSE.var : A numeric vector of the column with the dispensed dose (in x)
  • DAILY.DOSE.var : A numeric vector of the column with the daily prescribed dose (in y)
  • PRESC.DURATION.var : optional, an integer vector of the column with the prescription duration in days (in y)
  • UNIT.var : optional, A character vector of the medication unit column (identical for x and y)
  • FORM.var : optional, A character vector of the medication form column (identical for x and y)
  • VISIT.var : optional, an integer vector of the visit number (in y)
  • force.init.presc : logical, default TRUE; should the first prescribed dose be used for dispensing events occurring before the first prescription event?
  • force.presc.renew : logical, default TRUE; if a medication has not been prescribed during a prescription event, should its prescription end on this date?
  • consider.dosage.change : logical, default TRUE; should the supply duration be recalculated in case of prescription changes?
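The core idea behind this linking step, choosing for each dispensing event the most recent dose prescribed on or before the dispensing date, can be sketched in base R. This is a simplified illustration on made-up data, not AdhereR's implementation: it ignores hospitalizations, prescription end dates and dosage changes during a supply, and the helper name last_presc_dose is hypothetical. The fallback to the first prescription mirrors the behaviour described for force.init.presc = TRUE.

```r
# Made-up prescription history for one patient and one medication
presc <- data.frame(DATE.PRESC = as.Date(c("2014-01-01", "2015-03-01")),
                    DAILY.DOSE = c(40, 20))
# Made-up dispensing events (total dose = strength x quantity, same unit)
disp <- data.frame(DATE.DISP  = as.Date(c("2014-06-15", "2015-04-10")),
                   TOTAL.DOSE = c(1120, 560))

# Hypothetical helper: daily dose of the latest prescription on or before
# a date; falls back to the first prescription if the dispensing event
# precedes all prescriptions
last_presc_dose <- function(date, presc) {
  presc <- presc[order(presc$DATE.PRESC), ]
  idx <- findInterval(as.numeric(date), as.numeric(presc$DATE.PRESC))
  presc$DAILY.DOSE[pmax(idx, 1)]
}

disp$DAILY.DOSE <- last_presc_dose(disp$DATE.DISP, presc)
disp$DURATION   <- disp$TOTAL.DOSE / disp$DAILY.DOSE
disp   # both events last 28 days, but under different daily doses
```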

Because hospitalization data is optional, we can use the matching function with our example datasets:

# Merge dispensing and prescription data with AdhereR's medication_match function:

##### Fill in the arguments for the function call #######

In addition to the standard AdhereR columns, the output of the matching function contains some more columns with additional information:

  • FIRST.PRESC : A Date column with the date when the treatment was first prescribed
  • PRESC.START : A Date column with the start date of a prescription episode
  • PRESC.END : A Date column with the end date of a prescription episode. If there is no end date, this will be NA
  • DOSAGE.CHANGE : An integer column with the number of dosage changes considered for a given dispensing event.

Before calculating adherence, we check the results of our matching function for plausibility.

# Check output of matching function

str(NULL)

NULL

summary(NULL)

Length  Class   Mode
     0   NULL   NULL

Visualization of patient records

A first step towards deciding which algorithm is appropriate for these data is to explore medication histories visually. We do this by creating an object of type CMA0 for the two example patients, and plotting it. This type of plot can of course be created for a much bigger subsample of patients and saved as a JPEG, PNG, TIFF, EPS or PDF file using R’s plotting system for data exploration.

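# Select the two example patients from AdhereR's med.events dataset:
ExamplePats <- med.events[med.events$PATIENT_ID %in% c(37, 76), ]
# Create an object "cma0" of the most basic CMA type, "CMA0":
cma0 <- CMA0(data=ExamplePats, # use the two selected patients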
             ID.colname="PATIENT_ID", # the name of the column containing the IDs
             event.date.colname="DATE", # the name of the column containing the event date
             event.duration.colname="DURATION", # the name of the column containing the duration
             event.daily.dose.colname="PERDAY", # the name of the column containing the dosage
             medication.class.colname="CATEGORY", # the name of the column containing the category
             followup.window.start=0,  # FUW start in days since earliest event
             observation.window.start=182, # OW start in days since earliest event
             observation.window.duration=365, # OW duration in days
             date.format="%m/%d/%Y"); # date format (mm/dd/yyyy)
# Plot the object (CMA0 shows the actual event data only):
plot(cma0, # the object to plot
     align.all.patients=TRUE); # align all patients for easier comparison

Figure 1. Medication histories - two example patients

We can see that patient 76 had an interruption of more than 100 days between the second and third medB supply and several situations of new supply acquired while the previous supply was not exhausted. Patient 37 had shorter gaps between consecutive events, but very little overlap in supplies. For patient 76, the switch to medB happened while the medA supply was still available, then a switch back to medA happened later, at the end of the second year. For patient 37, there was a single medication switch (to medB) without an overlap at that point.

These observations highlight several decision points in calculating persistence and adherence, which need to be informed by the clinical context of the study:

  • what OW is relevant for calculating adherence and persistence? Both patients have been on treatment during the 2 years, they had a substantial number of events of relatively short duration, and variable delays between the end of a supply and the next event. Thus, their adherence might have oscillated substantially during this period. We could compute adherence and/or persistence for the full 2-year period, or consider shorter intervals;
  • is the largest interruption seen in patient 76 an indication of non-persistence, or of lower adherence over that time interval? If the medication is likely to be used rarely despite daily use recommendations, such an interval might indicate a period of low adherence. If usual adherence rates are close to 100% when used, that delay is likely to indicate a treatment gap and needs to be treated as such, and the last 2 events as reinitiation of treatment (new treatment episode);
  • is the switch from medA to medB an indicator of a new treatment episode? If medA and medB are two alternative formulations of the same chemical molecule, there might be clinical arguments for considering them as part of the same treatment episode (e.g. the pharmacist provided an alternative option to a product unavailable at the moment). If they are two distinct drug classes with different mechanisms of action and recommendations of use, it may be more appropriate to consider that patient 76 has had 3 treatment episodes and patient 37 one episode;
  • is it necessary to consider carry-over of oversupply from previous events? For patient 37 this seems to matter very little, as there is little overlap between event durations, but patient 76 has substantial overlaps. If available medication is not likely to be either overused or discarded at every new medication event, it is important to control for carry-over;
  • is it necessary to consider carry-over also when medication changes? Patient 76 has changed from medA to medB while still having a large supply of medA available. Was the patient more likely to discard the remaining medA the moment of receiving medB or finish it before starting the medB supply? If they are two alternative formulations and medB was (for example) given because medA was not in stock at the moment, probably this came with a recommendation to finish the available supply. If they are two distinct drug classes and the switch happens usually after assessment of therapeutic versus side effects, probably this came with a recommendation to stop using medA.

These decisions therefore need to be taken based on a good understanding of the pharmacological properties of the medication studied, and the most plausible clinical decision-making in routine care. This information can be collected from an advisory committee with relevant expertise (e.g. based on consensus protocols), or (even better) qualitative or survey research on the routine practices in prescribing, dispensing and using that specific medication. Of course, this is not always possible – a second-best option (or even complementary option, if consensus is not reached) is to compare systematically the effects of different analysis choices on the hypotheses tested (e.g. as sensitivity analyses).

Persistence – treatment episodes

An essential first decision is to distinguish between persistence with treatment and quality of implementation (once the patient started treatment – which, as explained above, is assumed in situations when we have only one data source of prescribing or dispensing events). The function compute.treatment.episodes() was developed for this purpose. We provide below an example of how this function can be used.

Let’s imagine that medA and medB are two different types of medication, and clinicians in our advisory committee agree that whenever a health care professional changes the type of medication supplied this should be considered as a new treatment episode; we will specify this as setting the parameter medication.change.means.new.treatment.episode to TRUE.

They also agree that a minimum of 6 months (180 days) needs to pass after the end of a medication supply (taken as prescribed) without receiving a new supply in order to be reasonably confident that the patient has discontinued/interrupted the treatment. They can conclude this, for example, based on an approximate calculation considering that this specific medication is usually supplied for 1-2 months, daily dosage is usually 2 to 4 pills a day, and patients often use as little as 1/4 of the recommended dose in a given interval. We will specify this as maximum.permissible.gap = 180, and maximum.permissible.gap.unit = "days". (If in another scenario the clinical information we obtain suggests that the permissible gap should depend on the duration of the last supply, for example that 6 times that interval should go by before a discontinuation becomes likely, we can specify this as maximum.permissible.gap = 600, and maximum.permissible.gap.unit = "percent".)
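One way to arrive at the 180-day figure can be written out as a short calculation. The numbers are illustrative assumptions (a 2-month supply, the upper end of the range mentioned, used at a quarter of the recommended dose), and the helper gap_percent_to_days is hypothetical: it only shows how a gap expressed in percent relates to the duration of the last supply as described in the text, not how the package computes it internally.

```r
supply_days   <- 60      # assumed: upper end of the 1-2 month supply range
fraction_used <- 1 / 4   # lowest plausible use rate mentioned in the text

# Days needed to use up the supply at a quarter of the recommended dose
actual_days <- supply_days / fraction_used   # 240
actual_days - supply_days                    # 180 days beyond the nominal end

# A gap given in percent scales with the duration of the last supply
# (hypothetical helper, for illustration only)
gap_percent_to_days <- function(percent, last_supply_days) {
  percent / 100 * last_supply_days
}
gap_percent_to_days(600, 30)   # 180
gap_percent_to_days(600, 60)   # 360
```

With the "percent" unit, the same setting therefore tolerates longer gaps after longer supplies, whereas the "days" unit applies one fixed threshold to every event.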

We might also have some clinical confirmation that people usually finish the existing supply before starting the new one (carryover.within.obs.window = TRUE), but only for the same medication, if medA and medB are supplied with a recommendation to start the new treatment immediately (carry.only.for.same.medication = TRUE), and that they take the existing supply according to the new dosage recommendations if these change (consider.dosage.change = TRUE).

The rest of the parameters specify the name of the dataset (here ExamplePats), names of the variables in the dataset (here based on the demo dataset, described above), and the FUW (here the whole 2-year window).

# Compute the treatment episodes for the two patients:
ExamplePats <- med.events[med.events$PATIENT_ID %in% c(37, 76), ]

TEs3<- compute.treatment.episodes(ExamplePats,
                                  ID.colname="PATIENT_ID",
                                  event.date.colname="DATE",
                                  event.duration.colname="DURATION",
                                  event.daily.dose.colname="PERDAY",
                                  medication.class.colname="CATEGORY",
                                  carryover.within.obs.window = TRUE, # carry-over into the OW
                                  carry.only.for.same.medication = TRUE, # & only for same type
                                  consider.dosage.change = TRUE, # carried-over supply follows the new dosage
                                  medication.change.means.new.treatment.episode = TRUE, # new episode on type change...
                                  maximum.permissible.gap = 180, # & after a gap longer than 180 days
                                  maximum.permissible.gap.unit = "days", # unit for the above (days)
                                  followup.window.start = 0, # 2-years FUW starts at earliest event
                                  followup.window.start.unit = "days",
                                  followup.window.duration = 365 * 2,
                                  followup.window.duration.unit = "days",
                                  date.format = "%m/%d/%Y");
knitr::kable(TEs3, 
             caption = "<a name=\"Table-2\"></a>**Table 2.** Example output `compute.treatment.episodes()` function");
Table 2. Example output compute.treatment.episodes() function
PATIENT_ID  episode.ID  episode.start  end.episode.gap.days  episode.duration  episode.end
37          1           2036-04-10     56                    211               2036-11-07
37          2           2037-01-02     122                   463               2038-04-10
76          1           2035-12-13     0                     374               2036-12-21
76          2           2036-12-21     60                    234               2037-08-12
76          3           2037-10-11     32                    62                2037-12-12

The function produces a dataset such as the one shown in Table 2. It includes each treatment episode for each patient (here 2 episodes for patient 37 and 3 for patient 76) and records the patient ID, episode number, date of episode start, gap days at the end of or after the treatment episode, duration of episode, and episode end date.

Notes:

  1. just the number of gap days after the end of the episode can be computed by keeping all values larger than the permissible gap and by replacing the others by 0,
  2. when medication change represents a new treatment episode, the previous episode ends when the last supply is finished (irrespective of the length of gap compared to a maximum permissible gap); any days before the date of the new medication supply are considered a gap. This maintains consistency with the computation of gaps between episodes (whether they are constructed based on the maximum permissible gap rule or the medication change rule).
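Note 1 and the internal consistency of Table 2 can be checked with a few lines of base R. The data frame below simply re-types the values of Table 2, the 180-day threshold is the maximum.permissible.gap used above, and the column name gap.beyond.permissible is made up for this sketch.

```r
# Values re-typed from Table 2
TEs <- data.frame(
  PATIENT_ID           = c(37, 37, 76, 76, 76),
  episode.ID           = c(1, 2, 1, 2, 3),
  episode.start        = as.Date(c("2036-04-10", "2037-01-02", "2035-12-13",
                                   "2036-12-21", "2037-10-11")),
  end.episode.gap.days = c(56, 122, 0, 60, 32),
  episode.duration     = c(211, 463, 374, 234, 62),
  episode.end          = as.Date(c("2036-11-07", "2038-04-10", "2036-12-21",
                                   "2037-08-12", "2037-12-12")))

# Episode duration equals the number of days between start and end
all(as.numeric(TEs$episode.end - TEs$episode.start) == TEs$episode.duration)

# Note 1: keep only gaps longer than the permissible gap, set the others to 0
max.gap <- 180
TEs$gap.beyond.permissible <- ifelse(TEs$end.episode.gap.days > max.gap,
                                     TEs$end.episode.gap.days, 0)
TEs$gap.beyond.permissible   # all 0 here: no gap in Table 2 exceeds 180 days
```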

This output can be used on its own to study causes and consequences of medication persistence (e.g. by using episode duration in time-to-event analyses). This function is also a basis for the CMA_per_episode class, which is described later in the vignette.

Adherence – continuous multiple interval measures of medication availability/gaps (CMA)

Let’s consider another scenario: medA and medB are alternative formulations of the same chemical molecule, and clinicians agree that they can be used by patients within the same treatment episode. In this case, both patients had a single treatment episode for the whole duration of the follow-up (Table 3). We can therefore compute adherence for any observation window (OW) within these 2 years without any concern that we might confuse quality of implementation with (non-)persistence.

# Compute the treatment episodes for the two patients
# but now a change in medication type does not start a new episode:
TEs4<- compute.treatment.episodes(ExamplePats,
                                  ID.colname="PATIENT_ID",
                                  event.date.colname="DATE",
                                  event.duration.colname="DURATION",
                                  event.daily.dose.colname="PERDAY",
                                  medication.class.colname="CATEGORY",
                                  carryover.within.obs.window = TRUE, 
                                  carry.only.for.same.medication = TRUE,
                                  consider.dosage.change = TRUE,
                                  medication.change.means.new.treatment.episode = FALSE, # here
                                  maximum.permissible.gap = 180,
                                  maximum.permissible.gap.unit = "days",
                                  followup.window.start = 0,
                                  followup.window.start.unit = "days",
                                  followup.window.duration = 365 * 2,
                                  followup.window.duration.unit = "days",
                                  date.format = "%m/%d/%Y");
# Pretty print the events:
knitr::kable(TEs4, 
             caption = "<a name=\"Table-3\"></a>**Table 3.** Alternative scenario output `compute.treatment.episodes()` function");
Table 3. Alternative scenario output compute.treatment.episodes() function
PATIENT_ID  episode.ID  episode.start  end.episode.gap.days  episode.duration  episode.end
37          1           2036-04-10     122                   730               2038-04-10
76          1           2035-12-13     32                    730               2037-12-12

Once we have clarified that we indeed measure quality of implementation and not (non-)persistence, several CMA classes can be used to compute this specific component of adherence. We will first discuss the simple CMA classes in turn, then present the more complex (or iterated) CMA_per_episode and CMA_sliding_window ones.

The simple CMAs

A first decision to consider when calculating the quality of implementation is what is the appropriate observation window – when it should start and how long it should last? We can see for example that patient 76 had some periods of regular (even overlapping) supplies, and periods when there were some large delays between consecutive medication events. Thus, estimating adherence for a whole 2-year period might be too coarse-grained to mean anything for how patients actually managed their treatment at any particular moment. As mentioned earlier in the Definitions section, EHD don’t have good granularity to start with, so we need to do the best with what we’ve got – and compressing all this information into a single estimate might not be the best solution, at least not the obvious first choice. On the other hand, due to the low granularity, we cannot target very short observation windows either because we simply don’t know what happened every day. This decision needs to be informed again by information collected from the advisory committee or qualitative/quantitative studies in the target population. It also needs to take into account the average duration of medication supply from one event, and the average time interval between two events – which can be examined in exploratory plots (Figure 1) – and the research question and design of the study. For example, if we expect that the quality of implementation reduces in time from the start of a treatment episode, medication is usually supplied for one month, and patients can take up to 4 times as much to use up their supplies, we might want to consider comparing successive 4-month OWs. If we want to examine quality of implementation 6 months before a clinical event (on the clinical assumption that how a patient takes medication in previous 6 months may impact on the probability of a health event occurring or not), we might want to consider an OW start 6 months before the event, and a 6-month duration. 
The possibilities here are endless, and research on the impact of different analysis choices on substantive results is still scarce. When consensus is not reached based on the available information, one or more parametrisations can be compared and formulated as research questions.

For demonstration purposes, let’s imagine a scenario in which an adherence intervention takes place 6 months (182 days) after the start of the treatment episode, and we hypothesize that it will improve the quality of implementation in the next year (365 days) in the intervention group compared to the control group. We can specify this as followup.window.start=0, observation.window.start=182, and observation.window.duration=365 (we can of course divide this interval into shorter windows and compare the two groups in terms of longitudinal changes in adherence, as we shall see later, but for the moment let’s stick to a global 1-year estimate). We have 9 CMA classes that can produce very different estimates of the quality of implementation; the first eight have been described by Vollmer and colleagues (2012) as applied to randomized controlled trials. We implemented them in AdhereR based on the authors’ description; in essence, they are defined by 4 parameters:

  1. how is the OW delimited (whether time intervals before the first event and after the last event are considered),
  2. whether CMA values are capped at 100%,
  3. whether medication oversupply is carried over to the next event interval, and
  4. whether medication available before a first event is considered in supply calculations or OW definition.

CMA1

CMA1 is the simplest method, often described in the literature as the medication possession ratio (MPR). It simply adds up the duration of all medication events within the OW, excluding the last event, and divides this by the number of days between the first and last event (multiplied by 100 to obtain a percentage). Thus, it can be higher than 1 (or 100% adherence) and, if the OW does not start and end with a medication event for all patients, it can actually refer to different lengths of time within the OW for different patients. For example, for patient 76 below, CMA1 is computed for the period starting with the first event in the highlighted interval and ending at the date of the last event; thus, it considers only 4 events with considerable overlaps and results in a CMA1 of 140%, indicating overuse.
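To make the arithmetic concrete, here is a minimal base-R sketch of the CMA1 ratio as described above. The toy dispensing history is hypothetical (it is not taken from ExamplePats), and the code only illustrates the formula; it is not AdhereR's internal implementation.

```r
# Hypothetical toy history for a single patient (not from ExamplePats)
events <- data.frame(
  date     = as.Date(c("2036-01-01", "2036-01-31", "2036-03-21", "2036-04-20")),
  duration = c(30, 30, 30, 30)  # days of supply per event
)

# Numerator: supply from all events except the last one
supply_days <- sum(head(events$duration, -1))

# Denominator: number of days between the first and the last event
interval_days <- as.numeric(max(events$date) - min(events$date))

cma1_ratio <- supply_days / interval_days
round(cma1_ratio * 100, 1)  # as a percentage, rounded to 1 decimal
```

Here 90 days of supply are spread over 110 days, so the ratio stays below 1; with heavily overlapping events the same formula exceeds 1, as for patient 76.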

Creating an object of class CMA1 with various parameters automatically performs the estimation of CMA1 for all the patients in the dataset; moreover, the object is smart enough to allow the appropriate printing and plotting. The object includes all the parameter values with which it was created, as well as the CMA data.frame, which is the main result, with two columns: patient ID and the corresponding CMA estimate. The CMA estimates appear as ratios, but can be trivially transformed into percentages and rounded, as we did for patient 76 below (rounded to 2 decimals). The plots show the CMA as percentage rounded to 1 decimal.

# Create the CMA1 object with the given parameters:
cma1 <- CMA1(data=ExamplePats,
             ID.colname="PATIENT_ID",
             event.date.colname="DATE",
             event.duration.colname="DURATION",
             followup.window.start=0, observation.window.start=182, 
             observation.window.duration=365,
             date.format="%m/%d/%Y");
# Display the summary:
cma1
## CMA1:
##   "The ratio of days with medication available in the observation window excluding the last event; durations of all events added up and divided by number of days from first to last event, possibly resulting in a value >1.0"
##   [
##     ID.colname = PATIENT_ID
##     event.date.colname = DATE
##     event.duration.colname = DURATION
##     followup.window.start = 0
##     followup.window.start.unit = days
##     followup.window.duration = 730
##     followup.window.duration.unit = days
##     observation.window.start = 182
##     observation.window.start.unit = days
##     observation.window.duration = 365
##     observation.window.duration.unit = days
##     date.format = %m/%d/%Y
##     CMA = CMA results for 2 patients
##   ]
##   DATA: 19 (rows) x 5 (columns) [2 patients].
# Display the estimated CMA table:
cma1$CMA
##   PATIENT_ID       CMA
## 1         37 0.4035874
## 2         76 1.4000000
# and equivalently using an accessor function:
getCMA(cma1);
##   PATIENT_ID       CMA
## 1         37 0.4035874
## 2         76 1.4000000
# Compute the CMA value for patient 76, as percentage rounded at 2 digits:
round(cma1$CMA[cma1$CMA$PATIENT_ID== 76, 2]*100, 2)
## [1] 140
# Plot the CMA:
# The legend shows the actual duration, the days covered and gap days, 
# the drug (medication) type, the FUW and OW, and the estimated CMA.
plot(cma1, 
     patients.to.plot=c("76"), # plot only patient 76 
     legend.x=520); # place the legend in a nice way
<a name="Figure-2"></a>**Figure 2.** Simple CMA 1


CMA2

Thus, CMA1 assumes that there is a treatment episode within the OW (shorter than or equal to the OW) during which the patient used the medication, and that every new medication event happened when the previous supply finished (possibly due to overuse). These assumptions rarely fit real-life use patterns. One limitation is that the last event is not considered, although it represents almost half of the OW in the case of patient 76.

To address this limitation, CMA2 includes the duration of the last event in the numerator and the period from the last event to the end of the OW in the denominator. Thus, the estimate in Figure 3 is 77.9%, more in line with the medication history of this patient in the year after the intervention.
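The difference from CMA1 can be sketched in a few lines of base R. The event days, durations and OW end below are hypothetical, and the sketch assumes (as the description above suggests) that the supply of the last event is added in full to the numerator; AdhereR's actual computation may handle edge cases differently.

```r
# Hypothetical toy history: event days (since the first event) and supply
start    <- c(0, 30, 80, 110)
duration <- c(30, 30, 30, 30)
ow_end   <- 150  # hypothetical end of the OW, in days since the first event

# CMA1-like ratio: last event excluded, interval from first to last event
cma1_like <- sum(head(duration, -1)) / (tail(start, 1) - start[1])

# CMA2-like ratio: last event included, interval from first event to OW end
cma2_like <- sum(duration) / (ow_end - start[1])

round(c(CMA1 = cma1_like, CMA2 = cma2_like) * 100, 1)
```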

cma2 <- CMA2(data=ExamplePats, # we're estimating CMA2 now!
             ID.colname="PATIENT_ID",
             event.date.colname="DATE",
             event.duration.colname="DURATION",
             followup.window.start=0, observation.window.start=182, 
             observation.window.duration=365,
             date.format="%m/%d/%Y");
plot(cma2, 
     patients.to.plot=c("76"),  
     show.legend=FALSE); # don't show legend to avoid clutter (see above)
<a name="Figure-3"></a>**Figure 3.** Simple CMA 2


CMA3 and CMA4

Both CMA1 and CMA2 can be higher than 1 (100% adherence), based on the assumption that the medication supply is used up by the last event (CMA1) or by the end of the OW (CMA2). But sometimes this is not plausible, because patients can refill their supply earlier (for example when going on holidays) and overuse is a less frequent behaviour for some medications (when the side effects of overuse are considerable, or medications are expensive). Or it may be that it does not matter whether patients use 100% or more than 100% of their medication, because the therapeutic effect is the same with no risks or side effects. Again, this is a matter of inquiry to the advisory committee or investigation in the target population.

If it is likely that implementation does not exceed 100% (or does not make a difference if it does), CMA3 and CMA4 below adjust for this by capping CMA1 and CMA2 respectively to 100%. As shown in Figures 4 and 5, CMA3 is now capped at 100%, and CMA4 remains the same as CMA2 (because it was already lower than 100%).
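Capping itself is a one-line operation. The sketch below applies it to the two CMA1 ratios reported earlier for patients 37 and 76; it only illustrates the idea of capping a ratio at 1 and is not meant to reproduce how CMA3 is computed internally.

```r
# CMA1 ratios reported above for patients 37 and 76
cma1_values <- c("37" = 0.4035874, "76" = 1.4000000)

# Cap at 1 (i.e., 100%): values below 1 are left unchanged
capped <- pmin(cma1_values, 1)

round(capped * 100, 1)
```

Patient 37 keeps the original estimate, while patient 76 drops from 140% to 100%.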

cma3 <- CMA3(data=ExamplePats, # we're estimating CMA3 now!
             ID.colname="PATIENT_ID",
             event.date.colname="DATE",
             event.duration.colname="DURATION",
             followup.window.start=0, observation.window.start=182, 
             observation.window.duration=365,
             date.format="%m/%d/%Y");
plot(cma3, patients.to.plot=c("76"), show.legend=FALSE);
<a name="Figure-4"></a>**Figure 4.** Simple CMA 3


cma4 <- CMA4(data=ExamplePats, # we're estimating CMA4 now!
             ID.colname="PATIENT_ID",
             event.date.colname="DATE",
             event.duration.colname="DURATION",
             followup.window.start=0, observation.window.start=182, 
             observation.window.duration=365,
             date.format="%m/%d/%Y");
plot(cma4,patients.to.plot=c("76"), show.legend=FALSE);
<a name="Figure-5"></a>**Figure 5.** Simple CMA 4


CMA5 and CMA6

All CMAs from 1 to 4 have a major limitation: they don’t take into account the timing of the events. But if there is a large gap between two events, it is more likely that the person used the medication less than prescribed during at least part of that interval. Just capping the values as in CMA3 and CMA4 does not account for that likely reduction in adherence: two patients with the same quantity of supply will have the same percentage of adherence even if one had substantial delays in supply at some points and the other refilled on time.

To adjust for this, CMA5 and CMA6 provide alternative calculations to CMA1 and CMA2 respectively. Here, we instead calculate the number of gap days, subtract it from the total time interval, and divide this value by the total time interval (first to last event in CMA5, and first event to end of OW in CMA6). By considering the gaps, we now need to decide how any remaining supply is handled when a new supply is obtained. Two additional parameters are included here: carry.only.for.same.medication and consider.dosage.change. Both are set here to FALSE, to specify that carry-over should always happen irrespective of what medication is supplied, and that the duration of the remaining supply should not be modified if the dosage recommendations are changed with a new medication event. As shown in Figures 6 and 7, these alternative calculations do not make any difference for patient 76, because there are no gaps between the 5 events in the highlighted OW. There could be, however, situations in which large gaps between some events in the OW result in lower CMA estimates when considering the timing of events.
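The gap-day logic can be sketched in base R as follows. This is a simplified illustration under stated assumptions: a single medication, no dosage changes, and any unused supply carried over in full to the next interval. The helper `gap_days()` and the toy data are hypothetical and are not part of AdhereR.

```r
# Hypothetical toy history: event days (since the first event) and supply
start    <- c(0, 20, 80, 110)
duration <- c(30, 30, 60, 30)
ow_end   <- 150  # hypothetical end of the OW, in days since the first event

# Hypothetical helper: total gap days, carrying unused supply forward.
# 'end' closes the last interval (last event date or end of the OW).
gap_days <- function(start, duration, end) {
  next_start <- c(start[-1], end)
  carry <- 0  # unused supply carried into the current interval
  gaps  <- 0
  for (i in seq_along(start)) {
    interval  <- next_start[i] - start[i]
    available <- duration[i] + carry
    gaps  <- gaps + max(0, interval - available)
    carry <- max(0, available - interval)
  }
  gaps
}

# CMA5-like: interval from the first to the last event (last supply unused)
last <- tail(start, 1)
gaps5 <- gap_days(head(start, -1), head(duration, -1), end = last)
cma5_like <- (last - start[1] - gaps5) / (last - start[1])

# CMA6-like: interval from the first event to the end of the OW
gaps6 <- gap_days(start, duration, end = ow_end)
cma6_like <- (ow_end - start[1] - gaps6) / (ow_end - start[1])

round(c(CMA5 = cma5_like, CMA6 = cma6_like) * 100, 1)
```

For the same toy data, simply adding up supply would give a ratio above 1 between the first and last event (120 days of supply over 110 days), whereas the gap-based calculation registers the 20 uncovered days between the second and third events.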

cma5 <- CMA5(data=ExamplePats, # we're estimating CMA5 now!
             ID.colname="PATIENT_ID",
             event.date.colname="DATE",
             event.duration.colname="DURATION",
             event.daily.dose.colname="PERDAY",
             medication.class.colname="CATEGORY",
             carry.only.for.same.medication=FALSE, # carry-over across medication types
             consider.dosage.change=FALSE, # don't consider changes in dosage
             followup.window.start=0, observation.window.start=182, 
             observation.window.duration=365,
             date.format="%m/%d/%Y");
plot(cma5,patients.to.plot=c("76"), show.legend=FALSE);
<a name="Figure-6"></a>**Figure 6.** Simple CMA 5


cma6 <- CMA6(data=ExamplePats, # we're estimating CMA6 now!
             ID.colname="PATIENT_ID",
             event.date.colname="DATE",
             event.duration.colname="DURATION",
             event.daily.dose.colname="PERDAY",
             medication.class.colname="CATEGORY",
             carry.only.for.same.medication=FALSE,
             consider.dosage.change=FALSE,
             followup.window.start=0, observation.window.start=182, 
             observation.window.duration=365,
             date.format="%m/%d/%Y");
plot(cma6,patients.to.plot=c("76"), show.legend=FALSE);
<a name="Figure-7"></a>**Figure 7.** Simple CMA 6


CMA7

All CMAs so far have another limitation: they do not consider the interval between the start of the OW and the first event within the OW. For situations in which the OW start coincides with the treatment episode start, this limitation has no consequences. But in scenarios like ours (the OW starts during the episode) this has two major drawbacks. First, the time interval for calculating CMA is not the same for all patients, which can result in biases. For example, if the intervention group tends to refill sooner after the intervention moment than the control group, the control group might seem more adherent, but only because its CMA is calculated on a shorter time interval within the following year. Second, if there is any medication supply left from before the OW start, this is not considered (so CMA may be underestimated).

CMA7 addresses this limitation by extending the denominator to the whole OW interval, and by considering carry-over both from before and within the OW. The same parameters are available to specify whether this depends on the type of medication and considers dosage changes (applied now to both types of carry-over). Figure 8 shows how considering the period at the OW start and the prior supply reduces CMA7 to 69%, due to the gap visible in the medication history plot between the event before the OW and the first event within the OW.
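A rough base-R sketch of this idea is given below. It assumes a single medication, no dosage changes, and that any leftover supply is used up before a new supply is started; the helper `covered_days()` and the toy data are hypothetical, and the code is a simplification rather than AdhereR's implementation.

```r
# Hypothetical toy history: event days (since the first event) and supply
start    <- c(0, 150, 200, 300)
duration <- c(60, 60, 60, 30)
ow_start <- 182  # OW start, in days since the first event
ow_end   <- 362  # hypothetical OW end (180-day window)

# Hypothetical helper: days of the OW covered by supply, with carry-over
# both from before the OW and within it.
covered_days <- function(start, duration, ow_start, ow_end) {
  supply_end <- -Inf  # day on which the previously obtained supply runs out
  covered <- 0
  for (i in seq_along(start)) {
    # a new supply is only started once the leftover supply is finished
    eff_start  <- max(start[i], supply_end)
    supply_end <- eff_start + duration[i]
    # overlap between this supply period and the OW
    overlap <- min(supply_end, ow_end) - max(eff_start, ow_start)
    covered <- covered + max(0, overlap)
  }
  covered
}

covered   <- covered_days(start, duration, ow_start, ow_end)
cma7_like <- covered / (ow_end - ow_start)
round(cma7_like * 100, 1)
```

In this toy example, the second event (obtained before the OW start) still contributes 28 covered days inside the OW, which an estimate starting at the first event within the OW would ignore.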

cma7 <- CMA7(data=ExamplePats, # we're estimating CMA7 now!
             ID.colname="PATIENT_ID",
             event.date.colname="DATE",
             event.duration.colname="DURATION",
             event.daily.dose.colname="PERDAY",
             medication.class.colname="CATEGORY",
             carry.only.for.same.medication=FALSE,
             consider.dosage.change=FALSE,
             followup.window.start=0, observation.window.start=182, 
             observation.window.duration=365,
             date.format="%m/%d/%Y");
plot(cma7, patients.to.plot=c("76"), show.legend=FALSE);
<a name="Figure-8"></a>**Figure 8.** Simple CMA 7


CMA8

When entering a randomized controlled trial involving a new medication, a patient on ongoing treatment may be more likely to finish the current supply before starting the trial medication. In these situations, it may be more appropriate to consider a lagged start of the OW (even if this results in a different denominator for trial participants). Let’s consider this different scenario for patient 76: at day 374, a new treatment (medB) starts and we need to estimate CMA for the next 294 days (until the next medication change). But there is still some medA left, so it is likely that the patient finished this first. Figure 9 shows how the OW is shortened by the number of days it would have taken to finish the remaining medA (assuming use as prescribed); CMA8 is quite low (36.1%), given the long gaps between medB events. In a future version, it might be interesting to implement the possibility of also moving the end of the OW so that its length is preserved.

cma8 <- CMA8(data=ExamplePats, # we're estimating CMA8 now!
             ID.colname="PATIENT_ID",
             event.date.colname="DATE",
             event.duration.colname="DURATION",
             event.daily.dose.colname="PERDAY",
             medication.class.colname="CATEGORY",
             carry.only.for.same.medication=FALSE,
             consider.dosage.change=FALSE,
             followup.window.start=0, observation.window.start=374, 
             observation.window.duration=294,
             date.format="%m/%d/%Y");
plot(cma8, patients.to.plot=c("76"), show.legend=FALSE);
# The value for patient 76, rounded at 2 digits
round(cma8$CMA[cma8$CMA$PATIENT_ID== 76, 2]*100, 2);
## [1] 36.14
<a name="Figure-9"></a>**Figure 9.** Simple CMA 8


CMA9

The previous 8 CMAs were described by Vollmer and colleagues (2012) in relation to randomized controlled trials, and may apply to many observational designs as well. However, they all rely on an assumption that might not hold for longitudinal cohort studies with multiple repeated measures: the medication is used as prescribed until the current supply ends. In CMA7, this may introduce additional variation in adherence estimates depending on where the start of the OW is located relative to the last event before the OW and the first event within the OW: an OW start closer to the first event in the OW generates lower estimates for the same number of gap days between the two events. To address this, CMA9 first computes a ratio of days’ supply for each event in the FUW (until the next event or FUW end), then weights all days in the OW by their corresponding ratio to generate an average CMA value for the OW.
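The two steps can be sketched in base R as follows. The sketch makes simplifying assumptions that are not stated in the text above: the per-interval ratio is capped at 1, and carry-over between intervals is ignored. The toy data are hypothetical, so this illustrates the weighting idea rather than AdhereR's exact computation.

```r
# Hypothetical toy history: event days (since the first event) and supply
start    <- c(0, 100, 200, 300)
duration <- c(60, 100, 50, 30)
fuw_end  <- 400  # hypothetical end of the FUW
ow_start <- 150  # hypothetical OW: days 150 to 349
ow_end   <- 350

# Step 1: supply ratio for each event interval (until next event or FUW end);
# capping at 1 is an assumption of this sketch
next_start <- c(start[-1], fuw_end)
ratio <- pmin(1, duration / (next_start - start))

# Step 2: give each day in the OW the ratio of the interval it falls in,
# then average over all days in the OW
ow_days   <- seq(ow_start, ow_end - 1)
interval  <- findInterval(ow_days, start)
cma9_like <- mean(ratio[interval])

round(cma9_like * 100, 1)
```

Because each OW day inherits the ratio of the whole event interval it belongs to, the estimate does not depend on how much of the supply happens to fall just before or just after the OW start.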

For the same scenario as in CMA1 to CMA7, Figure 10 shows the estimate for CMA9, which is higher than for CMA7 (70.6% versus 69%). This value would be the same no matter if the OW starts slightly earlier or later, because CMA9 considers the same intervals between events (the one starting before and the one ending after the OW). Thus, it depends less on the actual date when the OW starts.

cma9 <- CMA9(data=ExamplePats, # we're estimating CMA9 now!
             ID.colname="PATIENT_ID",
             event.date.colname="DATE",
             event.duration.colname="DURATION",
             event.daily.dose.colname="PERDAY",
             medication.class.colname="CATEGORY",
             carry.only.for.same.medication=FALSE,
             consider.dosage.change=FALSE,
             followup.window.start=0, observation.window.start=182, 
             observation.window.duration=365,
             date.format="%m/%d/%Y");
plot(cma9, patients.to.plot=c("76"), show.legend=FALSE);
<a name="Figure-10"></a>**Figure 10.** Simple CMA 9


The iterated CMAs

We introduce here two complex (or iterated) CMAs that share the property that they apply a given single CMA iteratively to a set of sub-periods (or windows), defined in various ways.

CMA per episode

When we calculated the persistence and implementation above, we first defined the treatment episodes, and then computed the CMAs within the episode. The CMA_per_episode class allows us to do this in one single step. In our intervention scenario, both example patients had a 2-year treatment episode and we computed the various simple CMAs for a 1-year period within this longer episode. But if we consider that medication change triggers a new treatment episode, patient 76 would have 3 episodes. CMA_per_episode can compute any of the 9 simple CMAs for all treatment episodes for all patients.

As with the simple CMAs, the CMA_per_episode class contains a list that includes all the parameter values, as well as a CMA data.frame (with all columns of the compute.treatment.episodes() output table, plus a new column with the CMA values). The CMA_per_episode values can also be transformed into percentages and rounded, as we did for patient 76 below (rounded to 2 decimals). Plots now include an extra section at the top, where each episode is shown as a horizontal bar of length equal to the episode duration, and the corresponding CMA estimates are given both as percentage (rounded to 1 decimal) and as a grey area. An extra area on the right of the plot displays the distribution of all CMA values for the whole FUW as a histogram or as smoothed kernel density (see Figure 11).

cmaE <- CMA_per_episode(CMA="CMA9", # apply the simple CMA9 to each treatment episode
                        data=ExamplePats,
                        ID.colname="PATIENT_ID",
                        event.date.colname="DATE",
                        event.duration.colname="DURATION",
                        event.daily.dose.colname="PERDAY",
                        medication.class.colname="CATEGORY",
                        carryover.within.obs.window = TRUE,
                        carry.only.for.same.medication = FALSE,
                        consider.dosage.change = FALSE, # conditions on treatment episodes
                        medication.change.means.new.treatment.episode = TRUE,
                        maximum.permissible.gap = 180,
                        maximum.permissible.gap.unit = "days",
                        followup.window.start=0,
                        followup.window.start.unit = "days",
                        followup.window.duration = 365 * 2,
                        followup.window.duration.unit = "days",
                        observation.window.start=0,
                        observation.window.start.unit = "days",
                        observation.window.duration=365*2,
                        observation.window.duration.unit = "days",
                        date.format="%m/%d/%Y",
                        parallel.backend="snow", # parallel processing speeds things up
                        parallel.threads=2);
# Summary:
cmaE;
## CMA_per_episode:
##   "CMA per treatment episode"
##   [
##     ID.colname = PATIENT_ID
##     event.date.colname = DATE
##     event.duration.colname = DURATION
##     event.daily.dose.colname = PERDAY
##     medication.class.colname = CATEGORY
##     carryover.within.obs.window = TRUE
##     carryover.into.obs.window = TRUE
##     carry.only.for.same.medication = FALSE
##     consider.dosage.change = FALSE
##     followup.window.start = 0
##     followup.window.start.unit = days
##     followup.window.duration = 730
##     followup.window.duration.unit = days
##     observation.window.start = 0
##     observation.window.start.unit = days
##     observation.window.duration = 730
##     observation.window.duration.unit = days
##     date.format = %m/%d/%Y
##     computed.CMA = CMA9
##     CMA = CMA results for 5 patients
##   ]
##   DATA: 19 (rows) x 5 (columns) [2 patients].
# The CMA estimates table:
cmaE$CMA
##   PATIENT_ID episode.ID episode.start end.episode.gap.days
## 1         37          1    2036-04-10                   56
## 2         37          2    2037-01-02                  122
## 3         76          1    2035-12-13                    0
## 4         76          2    2036-12-21                   60
## 5         76          3    2037-10-11                   32
##   episode.duration episode.end       CMA
## 1              211  2036-11-07 0.7109005
## 2              463  2038-04-10 0.3239741
## 3              374  2036-12-21 0.8422460
## 4              234  2037-08-12 0.3846154
## 5               62  2037-12-12 0.4838710
getCMA(cmaE); # as above but using accessor function
##   PATIENT_ID episode.ID episode.start end.episode.gap.days
## 1         37          1    2036-04-10                   56
## 2         37          2    2037-01-02                  122
## 3         76          1    2035-12-13                    0
## 4         76          2    2036-12-21                   60
## 5         76          3    2037-10-11                   32
##   episode.duration episode.end       CMA
## 1              211  2036-11-07 0.7109005
## 2              463  2038-04-10 0.3239741
## 3              374  2036-12-21 0.8422460
## 4              234  2037-08-12 0.3846154
## 5               62  2037-12-12 0.4838710
# The values for patient 76 only, rounded at 2 digits:
round(cmaE$CMA[cmaE$CMA$PATIENT_ID== 76, 7]*100, 2);
## [1] 84.22 38.46 48.39
# Plot:
plot(cmaE, patients.to.plot=c("76"), show.legend=FALSE);
<a name="Figure-11"></a>**Figure 11.** CMA 9 per episode


Sliding-window CMA

When discussing the issue of granularity earlier, we mentioned that estimating adherence for a whole 2-year period might be too coarse-grained to be clinically relevant, and that shorter intervals may be more appropriate, for example in studies that aim to investigate how the quality of implementation varies in time during a long-term treatment episode. In such cases, we might want to compare successive intervals, for example 4-month intervals. CMA_sliding_window allows us to compute any of the 9 simple CMAs for repeated time intervals (sliding windows) within an OW. The output is similar to that of CMA_per_episode, including a CMA table (with patient ID, window ID, window start and end dates, and the CMA estimate). Figure 12 shows the results of CMA9 for patient 76: 6 sliding windows of 4 months, of which two have a CMA higher than 80%, two have values around 60% and two around 40%, suggesting a variable quality of implementation.

cmaW <- CMA_sliding_window(CMA.to.apply="CMA9", # apply the simple CMA9 to each sliding window
                           data=ExamplePats,
                           ID.colname="PATIENT_ID",
                           event.date.colname="DATE",
                           event.duration.colname="DURATION",
                           event.daily.dose.colname="PERDAY",
                           medication.class.colname="CATEGORY",
                           carry.only.for.same.medication=FALSE,
                           consider.dosage.change=FALSE,
                           followup.window.start=0,
                           observation.window.start=0,
                           observation.window.duration=365*2,
                           sliding.window.start=0, # sliding windows definition
                           sliding.window.start.unit="days",
                           sliding.window.duration=120,
                           sliding.window.duration.unit="days",
                           sliding.window.step.duration=120,
                           sliding.window.step.unit="days",
                           date.format="%m/%d/%Y",
                           parallel.backend="snow", # parallel processing speeds things up
                           parallel.threads=2);
# Summary:
cmaW;
## CMA_sliding_window:
##   "CMA per sliding window"
##   [
##     ID.colname = PATIENT_ID
##     event.date.colname = DATE
##     event.duration.colname = DURATION
##     event.daily.dose.colname = PERDAY
##     medication.class.colname = CATEGORY
##     carryover.within.obs.window = TRUE
##     carryover.into.obs.window = TRUE
##     carry.only.for.same.medication = FALSE
##     consider.dosage.change = FALSE
##     followup.window.start = 0
##     followup.window.start.unit = days
##     followup.window.duration = 730
##     followup.window.duration.unit = days
##     observation.window.start = 0
##     observation.window.start.unit = days
##     observation.window.duration = 730
##     observation.window.duration.unit = days
##     date.format = %m/%d/%Y
##     computed.CMA = CMA9
##     sliding.window.start = 0
##     sliding.window.start.unit = days
##     sliding.window.duration = 120
##     sliding.window.duration.unit = days
##     sliding.window.step.duration = 120
##     sliding.window.step.unit = days
##     CMA = CMA results for 12 patients
##   ]
##   DATA: 19 (rows) x 5 (columns) [2 patients].
# The CMA estimates table:
cmaW$CMA
##    PATIENT_ID window.ID window.start window.end       CMA
## 1          37         1   2036-04-10 2036-08-08 0.4916667
## 2          37         2   2036-08-08 2036-12-06 0.6489297
## 3          37         3   2036-12-06 2037-04-05 0.5197778
## 4          37         4   2037-04-05 2037-08-03 0.3135842
## 5          37         5   2037-08-03 2037-12-01 0.3122259
## 6          37         6   2037-12-01 2038-03-31 0.1973684
## 7          76         1   2035-12-13 2036-04-11 0.8933692
## 8          76         2   2036-04-11 2036-08-09 0.6149642
## 9          76         3   2036-08-09 2036-12-07 1.0000000
## 10         76         4   2036-12-07 2037-04-06 0.6027778
## 11         76         5   2037-04-06 2037-08-04 0.4500000
## 12         76         6   2037-08-04 2037-12-02 0.3985663
getCMA(cmaW); # as above but using accessor function
##    PATIENT_ID window.ID window.start window.end       CMA
## 1          37         1   2036-04-10 2036-08-08 0.4916667
## 2          37         2   2036-08-08 2036-12-06 0.6489297
## 3          37         3   2036-12-06 2037-04-05 0.5197778
## 4          37         4   2037-04-05 2037-08-03 0.3135842
## 5          37         5   2037-08-03 2037-12-01 0.3122259
## 6          37         6   2037-12-01 2038-03-31 0.1973684
## 7          76         1   2035-12-13 2036-04-11 0.8933692
## 8          76         2   2036-04-11 2036-08-09 0.6149642
## 9          76         3   2036-08-09 2036-12-07 1.0000000
## 10         76         4   2036-12-07 2037-04-06 0.6027778
## 11         76         5   2037-04-06 2037-08-04 0.4500000
## 12         76         6   2037-08-04 2037-12-02 0.3985663
# The values for patient 76 only, rounded at 2 digits
round(cmaW$CMA[cmaW$CMA$PATIENT_ID== 76, 5]*100, 2);
## [1]  89.34  61.50 100.00  60.28  45.00  39.86
# Plot:
plot(cmaW, patients.to.plot=c("76"), show.legend=FALSE);
<a name="Figure-12"></a>**Figure 12.** Sliding window CMA 9


The sliding windows can also overlap, as illustrated in Figure 13. This can for example be used to estimate the variation of adherence (implementation) during an episode. Figure 13 shows 21 sliding windows of 4 months for patient 76, in steps of 1 month. The patient’s quality of implementation oscillated between 37% and 100% during the 2 years of follow-up. This output can be further analyzed in relation to patterns of health status if such data are available for the same time period.
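The number and position of the windows follow directly from the window duration and step. The base-R sketch below assumes that only windows fitting entirely within the 2-year OW are kept; the helper `window_table()` is hypothetical and not part of AdhereR, whose internal rule may differ. Under this assumption, a 120-day step yields the 6 windows of Figure 12 and a 30-day step yields 21 windows, as in Figure 13.

```r
ow_duration     <- 365 * 2  # OW length in days, as in the calls above
window_duration <- 120      # 4-month windows

# Hypothetical helper: start and end day of each window for a given step
window_table <- function(step) {
  starts <- seq(0, ow_duration - window_duration, by = step)
  data.frame(window.ID = seq_along(starts),
             start.day = starts,
             end.day   = starts + window_duration)
}

non_overlapping <- window_table(step = 120)  # consecutive windows
overlapping     <- window_table(step = 30)   # windows shifted by 1 month

nrow(non_overlapping)  # 6 windows
nrow(overlapping)      # 21 windows
head(overlapping, 3)
```

In the CMA_sliding_window() call shown earlier, overlapping windows of this kind would presumably be requested by setting sliding.window.step.duration=30 while keeping sliding.window.duration=120; the call actually used for Figure 13 is not shown here, so this is an assumption.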

<a name="Figure-13"></a>**Figure 13.** Sliding window CMA 9


Interactive plotting

During the exploratory phases of data analysis, it is sometimes extremely useful to be able to plot interactively various views of the data using different parameter settings. We have implemented such interactive plotting of medication histories and (simple and iterative) CMA estimates within RStudio through the plot_interactive_cma() function. This function is generic and interactive, and its most important argument is the dataset on which the plotting should be done. Currently, it uses RStudio’s manipulate library, which means that it only works within RStudio and that the interface is heavily limited by this library’s capabilities. Despite these constraints, it is a very useful and flexible tool.

Once the function is called, the user can select a patient (from a drop-down list of unique patient identifiers present in the dataset) and a simple CMA (1 to 9), and can also change various parameters concerning the FUW, the OW, the particular simple CMA, the treatment episodes, or the sliding windows, as appropriate. The effects of these choices are visualized in real time (see Figure 14 for a screenshot), but depending on the complexity of the computation and on the hardware, this might be more or less instantaneous. An example is given below (please note that you must run this code within RStudio and manually select the various parameters as shown in the screenshot in Figure 14):

# Interactive plotting of CMA per-treatment-episode
# Please run only within RStudio!
library(AdhereR)
plot_interactive_cma(data=med.events[med.events$PATIENT_ID %in% c(37, 76), ],
                     cma.class="per episode",
                     ID.colname="PATIENT_ID",
                     event.date.colname="DATE",
                     event.duration.colname="DURATION",
                     event.daily.dose.colname="PERDAY",
                     medication.class.colname="CATEGORY",
                     date.format="%m/%d/%Y");
Figure 14. Interactive plotting within RStudio (screenshot).


Technical details

Here we give an overview of some technical details, including the main S3 classes and functions (probably useful for scripting and extension), our treatment of dates and durations, and the issue of performance and parallelism (useful for large datasets).

Main S3 classes and functions

The S3 class CMA0 is the most basic object, essentially encapsulating the dataset and the desired parameter values; it should not normally be used directly (except for plotting the event data as such), but it is the foundation for the other classes. A CMA0 (and derived) object can print itself (the output is optimized for text, LaTeX or Markdown), can plot itself (with various parameters controlling exactly how), and offers the accessor function getCMA() for easy access to the CMA estimate. Please note that these CMAs all work for datasets that contain more than one patient: the estimates are computed for each patient independently, and the plotting can display more than one patient (in this case the patients are plotted on top of each other vertically), as shown in Figure 1.

The simple CMAs are implemented by the S3 classes CMA1 to CMA9, which are derived from CMA0 and override its methods. Thus, one can easily implement a new simple CMA by extending the base CMA0 class.
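For readers less familiar with S3, the general pattern of deriving a class from a base class and reusing its methods looks like the sketch below. All names here (ToyCMA0, ToyCMAcapped, getToyCMA) and the toy data are hypothetical; the sketch illustrates the S3 mechanism only and does not reproduce the structure or arguments of AdhereR's actual classes.

```r
# Hypothetical base class: stores the data, computes no estimate
ToyCMA0 <- function(data) {
  structure(list(data = data, CMA = NULL), class = "ToyCMA0")
}

# Hypothetical derived class: reuses the base constructor, adds an estimate
ToyCMAcapped <- function(data) {
  obj <- ToyCMA0(data)
  obj$CMA <- data.frame(ID  = data$ID,
                        CMA = pmin(1, data$supply / data$interval))
  class(obj) <- c("ToyCMAcapped", class(obj))  # derived class comes first
  obj
}

# Accessor defined once for the base class and inherited by derived classes
getToyCMA <- function(x) UseMethod("getToyCMA")
getToyCMA.ToyCMA0 <- function(x) x$CMA

# Print method overridden in the derived class
print.ToyCMAcapped <- function(x, ...) {
  cat("Toy capped CMA for", nrow(x$data), "patients\n")
  invisible(x)
}

toy_data <- data.frame(ID = c(1, 2), supply = c(60, 150), interval = c(120, 100))
toy_cma  <- ToyCMAcapped(toy_data)
print(toy_cma)
getToyCMA(toy_cma)
```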

The iterative CMAs, in contrast, are not derived from CMA0 but internally use such a simple CMA to perform their computations. For the moment, they cannot be extended to new simple CMAs derived from CMA0 but, if needed, such a mechanism could be implemented.

The most important functions are:

  • compute.event.int.gaps(): for a given event database, this computes the gap days and event intervals in various scenarios; while it should not in general be used directly, it is exported in case a use scenario requires this explicit computation;
  • compute.treatment.episodes(): this computes the treatment episodes for each patient in various scenarios;
  • getCMA(): getter functions, giving access to the estimated CMAs;
  • plot_interactive_cma(): plots interactively within RStudio (see the Interactive plotting section).

Calendar dates and durations

A potentially confusing (but very powerful and flexible) aspect of our implementation concerns our treatment of dates and durations.

First, the duration of an event is given in a column in the dataset containing, for each event, its duration (as a positive integer) in days. However, other durations (such as for FUW or the sliding windows) are given as positive integers representing the number of units; these units can be “days” (the default), “weeks”, “months”, or “years”.

Second, the date of an event is given in a column in the dataset containing, for each event, its start date as a string (character) in the format given by the date.format parameter (by default, mm/dd/yyyy). The start of the FUW, OW and sliding windows can be given in one of three ways: (a) as the number (integer) of units (“days”, “weeks”, “months”, or “years”) since the first recorded event for the patient; (b) as an object of class Date representing the actual calendar start date; or (c) as a string (character) giving the name of a column in the dataset that contains, per patient, either the calendar start date (in which case the column must be of type Date) or the number of units (if the column is of type numeric). While this might be confusing, it allows greater flexibility in specifying the start dates. The most important pitfall is passing a date as a string (type character): any string is interpreted as a column name, so this results in an error because there is no such column in the dataset. Make sure the date is converted to a Date object by using, for example, as.Date(). However, for most scenarios, the default of giving the number of units since the earliest event is more than enough and is the recommended (and most carefully tested) way.
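The distinction between a date held as a string and a Date object can be sketched in base R as follows. The dates and variable names are made up for illustration, and the format string "%m/%d/%Y" is assumed to be the strptime-style equivalent of mm/dd/yyyy.

```r
# Event dates as they would be stored in the dataset: character strings in
# the (assumed) "%m/%d/%Y" pattern, i.e. mm/dd/yyyy.
event.dates <- c("04/26/2033", "07/04/2033", "08/03/2033")
parsed <- as.Date(event.dates, format = "%m/%d/%Y")

# A window start given as a calendar date must be a Date object ...
window.start <- as.Date("2033-05-01")
# ... whereas a bare string is of type character and would therefore be
# read as a column name rather than as a date.
window.start.string <- "2033-05-01"

# The same start expressed as a number of days since the first event.
days.since.first <- as.integer(window.start - min(parsed))

class(window.start)         # "Date"
class(window.start.string)  # "character"
```

Expressing the start as a number of units since the first event, as on the last computed line, corresponds to the default and recommended way of specifying windows.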

Performance, parallelism and implementation

While AdhereR is currently implemented in pure R, we have extensively profiled and optimized the code to allow the processing of large databases even on consumer-grade hardware. For example, Table 4 below gives the running times (single-threaded and two parallel threads; see below for details) for a database of 13,922 unique patients and 112,984 prescriptions of all CMAs described here, on an Apple MacBook Air 11" (7,1; early 2015) with 8 GB RAM (DDR3 @ 1600MHz) and a Core i7-5650U CPU (2 cores, 4 threads with hyperthreading @ 2.20GHz, Turbo Boost to 3.10GHz), using MacOS X “El Capitan” (10.11.6), R 3.3.1 (64 bits) and RStudio 1.0.44. Table 5 below shows the running times (single-threaded and four parallel threads) for a very large database of 500,000 unique patients and 4,058,110 prescriptions (generated by repeatedly concatenating the database described above and uniquely renaming the participants) of all CMAs described here, on a desktop computer with 16 GB RAM and a Core i7-3770 CPU (4 cores, 8 threads with hyperthreading @ 3.40GHz, Turbo Boost to 3.90GHz), using OpenSuse 13.2 (Linux kernel 3.16.7) and R 3.3.2 (64 bits). Table 6 shows the same information as Table 5, but on a high-end desktop computer with 32 GB RAM and a Core i7-4790K CPU (4 cores, 8 threads with hyperthreading @ 4.00GHz, Turbo Boost to 4.40GHz), running Windows 10 Professional 64 bits (version 1607) and R 3.2.4 (64 bits); as discussed below, the “multicore” backend is currently not available on Windows.

As these benchmarking results show, a database close to the median sample sizes in the literature (median 10,265 patients versus our 13,922 patients; Sattler et al., 2011) can be processed almost in real-time on a consumer laptop, while very large databases (half a million patients) require tens of minutes to a few hours on mid-to-high-end desktop computers, especially when making use of parallel processing. Interestingly, Linux seems to have a small but measurable performance advantage over Windows (despite the slightly lower-end hardware) and the “multicore” backend becomes preferable to the “snow” backend for very large databases (probably due to the data transmission and collection overheads), but not by a very large margin. Therefore, for very large databases, we recommend Linux on a multi-core/multi-CPU machine with enough RAM and the “multicore” backend.

Table 4. Performance as running times (single- and two-threaded, multicore and snow respectively) when computing CMAs for a large database with 13,922 patients with 112,983 events on a consumer-grade MacBook Air laptop running MacOS X El Capitan. The times shown are “real” (i.e., clock) running times in seconds (as reported by R’s system.time() function), followed in parentheses by the equivalent in minutes. In all cases, the FUW and OW are identical at 2 years long. CMAs per episode (with gap=180 days) and sliding window (length=180 days, step=90 days) used CMA1 for each episode/window. Please note that the multicore and snow times are slightly longer than half the single-core times due to various data transmission and collection overheads.

CMA             Single-threaded  Two threads (multicore)  Two threads (snow)
CMA 1           40.8 (0.7)       20.8 (0.4)               22.0 (0.4)
CMA 2           41.2 (0.7)       21.7 (0.4)               24.4 (0.4)
CMA 3           39.3 (0.7)       20.4 (0.3)               22.9 (0.4)
CMA 4           40.2 (0.7)       21.3 (0.4)               23.0 (0.4)
CMA 5           56.6 (0.9)       29.7 (0.5)               31.5 (0.5)
CMA 6           58.0 (1.0)       30.9 (0.5)               32.5 (0.5)
CMA 7           55.5 (0.9)       28.9 (0.5)               30.6 (0.5)
CMA 8           131.8 (2.2)      72.5 (1.2)               71.6 (1.2)
CMA 9           159.4 (2.7)      85.2 (1.4)               86.5 (1.4)
per episode     263.9 (4.4)      139.0 (2.3)              139.7 (2.3)
sliding window  643.6 (10.7)     347.9 (5.8)              339.5 (5.7)

Table 5. Performance as running times (single- and four-threaded, multicore and snow respectively) when computing CMAs for a very large database with 500,000 patients with 4,058,110 events on a mid/high-range consumer desktop running OpenSuse 13.2 Linux. The times shown are “real” (i.e., clock) running times in seconds (as reported by R’s system.time() function), followed in parentheses by the equivalent in minutes and, if large enough, hours. In all cases, the FUW and OW are identical at 2 years long. CMAs per episode (with gap=180 days) and sliding window (length=180 days, step=90 days) used CMA1 for each episode/window. Please note that the multicore and especially the snow times are longer than a quarter of the single-core times due to various data transmission and collection overheads.

CMA             Single-threaded              Four threads (multicore)    Four threads (snow)
CMA 1           1839.7 (30.6)                577.0 (9.6)                 755.5 (12.6)
CMA 2           1779.0 (29.7)                490.1 (8.2)                 915.7 (15.3)
CMA 3           1680.6 (28.0)                458.5 (7.6)                 608.3 (10.1)
CMA 4           1778.9 (30.0)                489.0 (8.2)                 644.5 (10.7)
CMA 5           2500.7 (41.7)                683.3 (11.4)                866.2 (14.4)
CMA 6           2599.8 (43.3)                714.5 (11.9)                1123.8 (18.7)
CMA 7           2481.2 (41.4)                679.4 (11.3)                988.1 (16.5)
CMA 8           5998.0 (100.0 = 1.7 hours)   1558.1 (26.0)               2019.6 (33.7)
CMA 9           7039.7 (117.3 = 1.9 hours)   1894.7 (31.6)               3002.7 (50.0)
per episode     11548.5 (192.5 = 3.2 hours)  3030.5 (50.5)               3994.2 (66.6)
sliding window  27651.3 (460.8 = 7.7 hours)  7198.3 (120.0 = 2.0 hours)  12288.8 (204.8 = 3.4 hours)

Table 6. Performance as running times (single-threaded and four-threaded snow) when computing CMAs for a very large database with 500,000 patients with 4,058,110 events on a high-end desktop computer running Windows 10. The times shown are “real” (i.e., clock) running times in seconds (as reported by R’s system.time() function), followed in parentheses by the equivalent in minutes and, if large enough, hours. In all cases, the FUW and OW are identical at 2 years long. CMAs per episode (with gap=180 days) and sliding window (length=180 days, step=90 days) used CMA1 for each episode/window. Please note that the snow times are longer than a quarter of the single-core times due to various data transmission and collection overheads.

CMA             Single-threaded              Four threads (snow)
CMA 1           2070.9 (34.5)                653.1 (10.9)
CMA 2           2098.9 (35.0)                667.5 (13.4)
CMA 3           2013.8 (33.6)                661.5 (22.0)
CMA 4           2094.4 (34.9)                685.2 (11.4)
CMA 5           2823.4 (47.1)                881.0 (14.7)
CMA 6           2909.0 (48.5)                910.3 (15.2)
CMA 7           2489.1 (41.5)                772.6 (12.9)
CMA 8           5982.5 (99.7 = 1.7 hours)    1810.1 (30.2)
CMA 9           6030.2 (100.5 = 1.7 hours)   2142.1 (35.7)
per episode     10717.1 (178.6 = 3.0 hours)  3877.2 (64.6)
sliding window  25769.5 (429.5 = 7.2 hours)  9353.6 (155.9 = 2.6 hours)

Concerning parallelism, if run on a multi-core/multi-processor machine or cluster, AdhereR gives the user the possibility to use (completely transparently) two parallel backends: multicore (available on Linux, *BSD and MacOS, but currently not on Microsoft Windows) and snow (Simple Network of Workstations, available on all platforms; in fact, this can use various types of backends, see the documentation of package snow for details). Parallelism is available through the parallel.backend and parallel.threads parameters. The first controls the actual backend to use: “none” (the default, which uses a single thread), “multicore”, or one of several versions of snow (“snow”, “snow(SOCK)”, “snow(MPI)”, “snow(NWS)”). The second gives the number of desired parallel threads (“auto” defaults to the reported number of cores for “multicore” or 2 otherwise, and to 2 for “snow”) or a more complex specification of the nodes for “snow” (see the snow package documentation for details). The implementation uses mclapply (in package parallel) and parLapply (package snow), is completely hidden from the user, and tries to pre-allocate whole chunks of patients to the CPUs/cores in order to reduce the various overheads (such as data transfer). In general, for “multicore” and “snow” with nodes on the local machine, do not use more threads than the number of physical cores in the system, and be mindful of the various overheads involved: the gains, while substantial especially for large databases, will be slightly lower than the expected reduction of the running time to 1/#threads (as a corollary, it might not be a good idea to parallelize very small datasets). Also, memory might be of concern when parallelizing, as at least parts of R’s environment will be replicated across threads/processes; this, in turn, for large environments and systems low on RAM, might result in massive performance loss due to swapping (or even result in crashes).
For more information on parallelism in R please see, for example, CRAN Task View: High-Performance and Parallel Computing with R and the various blogposts and books on the matter.
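The idea of pre-allocating whole chunks of patients can be sketched with the parallel package that ships with R. This is an illustration under stated assumptions rather than AdhereR's internal code: the workhorse() function, the toy data and the chunking rule are all hypothetical.

```r
library(parallel)

# Hypothetical per-chunk workhorse: total supplied days for each patient.
workhorse <- function(chunk) {
  aggregate(DURATION ~ PATIENT_ID, data = chunk, FUN = sum)
}

# Toy event data: 8 patients with 3 events of 30 days each.
events <- data.frame(PATIENT_ID = rep(1:8, each = 3), DURATION = 30)

# Pre-allocate whole patients to n.threads chunks of roughly equal size,
# so that no patient's events are split across workers.
n.threads <- 2L
ids <- unique(events$PATIENT_ID)
chunk.of.id <- cut(seq_along(ids), breaks = n.threads, labels = FALSE)
chunks <- split(events, chunk.of.id[match(events$PATIENT_ID, ids)])

# "multicore"-style processing with forked workers; on Windows mclapply()
# only accepts mc.cores = 1, so fall back to a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else n.threads
res.list <- mclapply(chunks, workhorse, mc.cores = cores)

# Recombine the per-chunk results into a single data.frame.
res <- do.call(rbind, res.list)

# With a snow-type cluster object cl, parLapply(cl, chunks, workhorse)
# would play the same role as the mclapply() call above.
```

Because each worker receives one chunk only once, the data transfer cost is paid once per thread rather than once per patient.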

Conceptually, we exploited various optimization techniques (see, for example, Hadley Wickham’s Advanced R and other blogposts on profiling and optimizing R code), but the two most important architectural decisions are (a) to use data.table extensively and (b) to pre-allocate chunks of participants for parallel processing. The general framework is to define a “workhorse” function that can process a set of participants and returns one data.frame or data.table (or several, in which case they must be encapsulated in a list()). This workhorse function is transparently called either once for the whole dataset (if parallel.backend is “none”) or in parallel for subsets of the dataset of roughly 1/parallel.threads of its size (for “multicore” and “snow”); in the latter case the results are transparently recombined (even if multiple results are returned in a list()). Internally, the workhorse functions tend to make extensive use of the data.table “reference semantics” (the := operator) to perform in-place changes and avoid unnecessary copying of objects, of keys for fast indexing, search and selection, and of the by grouping mechanism, which allows the application of a specialized function to each individual patient (or episode or sliding window, as needed). We decided to keep everything “pure R” (so there is so far no C++ code), and the code is extensively commented and hopefully easy to understand, change and extend.
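The three data.table features mentioned above (keys, := and by) can be seen together in the following small sketch. It assumes the data.table package is installed; the toy data, the column names INTERVAL and GAP, and the simplified gap rule are illustrative assumptions and do not reproduce the computations of compute.event.int.gaps().

```r
library(data.table)

events <- data.table(
  PATIENT_ID = c(1, 1, 1, 2, 2),
  DATE       = as.Date(c("2033-01-01", "2033-02-05", "2033-03-02",
                         "2033-01-10", "2033-03-20")),
  DURATION   = c(30, 30, 30, 60, 30)
)

# Key on patient and date: sorts the table and enables fast keyed subsetting.
setkey(events, PATIENT_ID, DATE)

# Add columns in place (by reference), computed separately for each patient:
# the number of days until the patient's next event ...
events[, INTERVAL := as.numeric(shift(DATE, type = "lead") - DATE),
       by = PATIENT_ID]
# ... and the part of that interval not covered by the supplied duration
# (a simplified rule that ignores carry-over; NA for the last event).
events[, GAP := pmax(0, INTERVAL - DURATION)]

# Fast keyed selection of all events of patient 1.
patient1 <- events[.(1)]
```

Since := modifies the table in place, no copy of events is made when the two columns are added, which matters for tables with millions of rows.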

Conclusions

‘AdhereR’ was developed to facilitate flexible and comprehensive analyses of adherence to medication from electronic healthcare data. All objects included in this package (‘compute.treatment.episodes’, ‘CMA1’ to ‘CMA9’, and their ‘CMA_per_episode’ and ‘CMA_sliding_window’ versions) can be adapted to various research questions and designs, and we have provided here only a few examples of the vast range of possibilities for use. Depending on the type of medication, study population, length of follow-up, etc., the various alternative parametrizations may lead to substantial differences or negligible variation. Very little evidence is available on the impact of these choices in specific scenarios. This package makes it easy to integrate such methodological investigations into data analysis plans, and to communicate these to the scientific community.

We have also aimed to facilitate replicability. Thus, summaries of functions include all parameter values and are easily printed for transparent reporting (for example in an appendix or in supplementary online material). The calculation of adherence values via ‘AdhereR’ can also be integrated in larger data analysis scripts and made available in a data repository for future use in similar studies, either freely available or with specific access rights. This allows other research teams to use the same parametrizations (for example if studying the same type of medication in different populations), and thus increases the homogeneity of studies for the benefit of later meta-analytic efforts. If these parametrizations are complemented by justifications of each decision based on clinical and/or research evidence in specific clinical areas, they can be subject to discussion and clinical consensus building and thus represent transparent and easily implementable guidelines for EHD-based adherence research in those areas. In this situation, comparisons across medications can also take into account any differences in analysis choices, and general rules can be derived for adherence calculation across domains.

References

Arnet I., Kooij M.J., Messerli M., Hersberger K.E., Heerdink E.R., Bouvy M. (2016) Proposal of Standardization to Assess Adherence With Medication Records: Methodology Matters. The Annals of Pharmacotherapy 50(5):360–368. doi:10.1177/1060028016634106.

Gardarsdottir H., Souverein P.C., Egberts T.C.G., Heerdink E.R. (2010) Construction of drug treatment episodes from drug-dispensing histories is influenced by the gap length. Journal of Clinical Epidemiology 63(4):422–427. doi:10.1016/j.jclinepi.2009.07.001.

Greevy R.A., Huizinga M.M., Roumie C.L., Grijalva C.G., Murff H., Liu X., Griffin M.R. (2011) Comparisons of Persistence and Durability Among Three Oral Antidiabetic Therapies Using Electronic Prescription-Fill Data: The Impact of Adherence Requirements and Stockpiling. Clinical Pharmacology & Therapeutics 90(6):813–819. doi:10.1038/clpt.2011.228.

Peterson A.M., Nau D.P., Cramer J.A., Benner J., Gwadry-Sridhar F., Nichol M. (2007) A checklist for medication compliance and persistence studies using retrospective databases. Value in Health: Journal of the International Society for Pharmacoeconomics and Outcomes Research 10(1):3–12. doi:10.1111/j.1524-4733.2006.00139.x.

Souverein P.C., Koster E.S., Colice G., van Ganse E., Chisholm A., Price D., et al. (in press) Inhaled Corticosteroid Adherence Patterns in a Longitudinal Asthma Cohort. The Journal of Allergy and Clinical Immunology: In Practice. doi:10.1016/j.jaip.2016.09.022.

Vollmer W.M., Xu M., Feldstein A., Smith D., Waterbury A., Rand C. (2012) Comparison of pharmacy-based measures of medication adherence. BMC Health Services Research 12(1):155. doi:10.1186/1472-6963-12-155.

Vrijens B., De Geest S., Hughes D.A., Przemyslaw K., Demonceau J., Ruppar T., Dobbels F., Fargher E., Morrison V., Lewek P., Matyjaszczyk M., Mshelia C., Clyne W., Aronson J.K., Urquhart J.; ABC Project Team (2012) A new taxonomy for describing and defining adherence to medications. British Journal of Clinical Pharmacology 73(5):691–705. doi:10.1111/j.1365-2125.2012.04167.x.

Van Wijk B.L.G., Klungel O.H., Heerdink E.R., de Boer A. (2006) Refill persistence with chronic medication assessed from a pharmacy database was influenced by method of calculation. Journal of Clinical Epidemiology 59(1):11–17. doi:10.1016/j.jclinepi.2005.05.005.

Sattler E., Lee J., Perri M. (2011) Medication (Re)fill Adherence Measures Derived from Pharmacy Claims Data in Older Americans: A Review of the Literature. Drugs & Aging 30(6):383–399. doi:10.1007/s40266-013-0074-z.